Hi,
> Using COPY for the init load and insert for subsequent handling. Can't use
> COPY there cause there are WHERE clauses in there I need.
>
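One common workaround for the COPY-with-WHERE limitation is to COPY the raw file into a staging table first and then filter with INSERT ... SELECT. A hedged sketch (the table names raw_stage and events, and the id column, are assumptions for illustration):

```sql
-- Load the daily file unfiltered into a scratch table.
CREATE TEMP TABLE raw_stage (LIKE events);
COPY raw_stage FROM '/path/to/daily_file.txt';

-- Apply the filter that COPY itself cannot express.
INSERT INTO events
SELECT * FROM raw_stage s
WHERE NOT EXISTS (SELECT 1 FROM events e WHERE e.id = s.id);
```

That keeps COPY's bulk-load speed while still letting you deduplicate or filter on the way into the real table.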
> I start off with 3.3GB of data and things just grow from that with a fresh
> 3GB file every day! Plenty of dup data in it though. Lots of complex
> queries get run on this data, which generates even more info... At the
> end of the initial run, I get ~50GB of data in the pg data dir (course
> that's got a lot of additional info than just a plain text file...) with
> ~10GB per subsequent run.
I would suggest optimizing your application!
Try to:
- cache previously inserted IDs and other values
  (e.g. in Perl hashes)
- write simpler WHERE clauses
- look at your indexes; perhaps you can create an index on two columns?
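The ID-caching point above can be sketched like this; the original suggests Perl hashes, and here is the same idea in Python, with insert_row standing in for the real database INSERT (both names are hypothetical):

```python
# Cache of IDs already loaded, so duplicates are skipped in the
# application instead of hitting the database at all.
seen_ids = set()
inserted = []

def insert_row(row):
    # Placeholder for the real INSERT statement; we just record the row.
    inserted.append(row)

def load_row(row):
    # Skip duplicates without a round-trip to the database.
    if row["id"] in seen_ids:
        return False
    seen_ids.add(row["id"])
    insert_row(row)
    return True

rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "a"}]
results = [load_row(r) for r in rows]
# only the two unique IDs reach the insert path
```

With a few million distinct IDs per day this stays comfortably inside 1 GB of RAM, and each skipped duplicate saves a full INSERT round-trip.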
And:
run VACUUM ANALYZE after the database has been freshly built and filled
with the first ~100,000 rows.
Later, run VACUUM ANALYZE every 1,000,000 or 10,000,000 rows ...
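The cadence above could be wired into the loader like this; run_sql is a hypothetical stand-in for executing a statement against PostgreSQL, and the interval is scaled down for the demo:

```python
VACUUM_EVERY = 1000  # demo value; the advice above is 1,000,000-10,000,000

statements = []

def run_sql(sql):
    # Hypothetical stand-in for sending a statement to the database.
    statements.append(sql)

def insert_with_periodic_vacuum(rows):
    since_vacuum = 0
    for row in rows:
        run_sql(f"INSERT ... {row}")  # placeholder for the real insert
        since_vacuum += 1
        if since_vacuum >= VACUUM_EVERY:
            run_sql("VACUUM ANALYZE")  # refresh planner statistics
            since_vacuum = 0

insert_with_periodic_vacuum(range(2500))
vacuums = statements.count("VACUUM ANALYZE")
# 2500 rows at a 1000-row cadence -> two VACUUM ANALYZE runs
```

The point is simply that the planner's statistics go stale as the table grows, so re-analyzing at a fixed row interval keeps the complex queries using sensible plans.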
> BTW, I've got a dual proc machine with a RAID-0 array and 1 GB of RAM, but
> pg only uses one CPU at a time. Would have been great if it had been
> multi-threaded or something.
If you use two inserting processes, Postgres should also use two CPUs. AFAIK! :)
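A minimal sketch of that idea: split the input and feed it to two concurrent workers, each of which would hold its own connection, so the server-side work runs in two separate backends. Here insert_chunk is a hypothetical placeholder for the real per-row INSERT loop:

```python
from concurrent.futures import ThreadPoolExecutor

def insert_chunk(chunk):
    # A real worker would open its own PostgreSQL connection and INSERT
    # each row; that server-side work runs in a separate backend process,
    # which the OS can schedule on the second CPU. Here we just count.
    return len(chunk)

rows = list(range(100))
chunks = [rows[:50], rows[50:]]  # split the load between two workers

with ThreadPoolExecutor(max_workers=2) as ex:
    counts = list(ex.map(insert_chunk, chunks))
total = sum(counts)
```

Since PostgreSQL starts one backend per connection, parallelism comes from the client side: two connections, two backends, two CPUs.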
Ciao
Alvar
--
http://www.teletrust.info/
http://www.odem.org/ || http://www.odem.org/insert_coin/imkp2001.html
--
AGI :: Hohnerstrasse 23, 70469 Stuttgart
Phone +49 (0)711.490 320-0, Fax +49 (0)711.490 320-150
AGI ranked 3rd in the new multimedia creative ranking
http://www.agi.de/tagebuch/